LAMP-TR-129 CS-TR-4781 UMIACS-TR-2006-06 January 2006 HANDWRITING IDENTIFICATION, MATCHING, AND INDEXING IN NOISY DOCUMENT IMAGES
Abstract
Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the electronic document era, we are challenged with making an enormous number of handwritten documents available for electronic access. Although many handwritten documents contain only handwriting, more are now mixed with printed text, noise, and background patterns. The mixture of handwriting with other components presents a great challenge for making an original document electronically accessible.

Many handwritten documents carry a particular background pattern: rule lines printed on the paper to guide writing. After digitization, rule lines that touch text cause problems for further document image analysis unless they are detected and removed. In this dissertation, we present a rule line detection algorithm based on hidden Markov model (HMM) decoding, achieving both high detection accuracy and a low false alarm rate. After detection, line removal is performed by line width thresholding.

Handwriting is often mixed with printed text, as with signatures and annotations on a business letter. Handwriting in a printed document often indicates corrections, additions, or other supplemental information that should be treated differently from the main content. The data set we are processing is noisy, which makes the problem more challenging. In this dissertation, we first segment the document at a suitable level, and then classify ...

The support of this research by the US Department of Defense under contract MDA-9040-2C-0406 is gratefully acknowledged.
Similar resources
LAMP-TR-129 CS-TR-4781 UMIACS-TR-2006-06 January 2006 HANDWRITING IDENTIFICATION, MATCHING, AND INDEXING IN NOISY DOCUMENT IMAGES
Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the electronic document era, we are challenged with making an enormous number of handwritten documents available for electronic access. Although many handwritten documents contain only handwriting, more are now mixed with printed text, noise, and b...
CS-TR-3790 UMIACS-TR-97-40 CLIS-TR-97-06 A Study on Video Browsing Strategies
Due to the unique characteristics of video, traditional surrogates and control/browsing mechanisms that facilitate text-based information retrieval may not work sufficiently for video. In this paper, a video browsing interface prototype with key frames and fast play-back mechanisms was built and tested. Subjects performed two kinds of browsing-related tasks: object identification and video comp...
LAMP-TR-145 CS-TR-4877 UMIACS-TR-2007-36 HCIL-2007-10 July 2007 Exploring the Effectiveness of Related Article Search in PubMed
We describe two complementary studies that explore the effectiveness of related article search in PubMed. The first attempts to characterize the topological properties of document networks that are implicitly defined by this capability. The second focuses on analysis of PubMed query logs to gain an understanding of real user behavior. Combined evidence suggests that related article search is bo...
CS-TR-4182 UMIACS-TR-2000-64 A SIMULATION ENVIRONMENT FOR EVOLVING MULTIAGENT COMMUNICATION September 2000
A simulation environment has been created to support study of emergent communication. Multiple agents exist in a two-dimensional world where they must find food and avoid predators. While non-communicating agents may survive, the world is configured so that survival and fitness can be enhanced through the use of inter-agent communication. The goal with this version of the simulator is to determine c...
LAMP-TR-119 CS-TR-4695 UMIACS-TR-2005-04 February 2005 Automatically Evaluating Answers to Definition Questions
Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called Pourpre, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions involves manual determination of whether an information nugget appears in a...
Publication date: 2006